Image stabilisation system and method

ABSTRACT

A system for stabilising video signals generated by a video camera which may be mounted in an unstable manner includes a digital processing means for manipulation of each incoming image that attempts to overlay features of the current image onto similar features of a previous image. A mask is used that prevents parts of the image that are likely to cause errors in the overlaying process from being used in the calculation of the required movement to be applied to the image. The mask may include areas where small movements of the image have been detected, and may also include areas where image anomalies including excess noise have been detected. The invention also discloses means for dealing with wanted movements of the camera, such as pan or zoom, and also discloses means for dealing with the edges of video signals as processed by the system. A method is also disclosed.

This invention relates to a system and method for the stabilisation of video images produced by a camera. In particular, it relates to means for reducing or eliminating the offset of successive images recorded by the camera caused by undesired camera motion such as wobble.

Undesired camera motion can distort the received image, or make visual analysis of the image hard, due to the perceived jumpiness on the reproduction apparatus. Such camera motion can be caused, for example, by wind blowing on a camera support pole, or unstable support due to the camera being hand held, or mounted on a moving vehicle or boat etc.

One method used to stabilise the video image from a camera suffering unwanted movement is to use a motorised camera mount that measures camera movement caused by such things as wind buffet, and physically moves the camera in response. Gyroscopes may be used to detect the movement of the camera, and electric motors used to correct for it. This provides, in effect, an electromechanical negative feedback loop that attempts to keep the camera in a fixed position. This solution can be very effective where the camera is used in surveillance applications where the camera is mounted on a permanent or semi-permanent mount. The method is able to reduce blurring of the video image even when very slow camera shutter speeds are used as, in a correctly stabilised system, the camera itself is not moving. It is also unaffected by the lighting conditions of the scene, as no reference is made to the recorded video signal, but the technique can be costly, cumbersome and may require significant electrical resources, especially if the camera system is large.

A development of this is to use sensors to detect the camera motion as before, but use the signals from the sensors to process the signals produced by the camera. The processing may involve electronically shifting the image with the intention of bringing it into alignment with previous images recorded by the camera. This approach eliminates the requirement for a motorised camera mount, but the movement sensors are still required.

Image stabilisation can be done purely by electronic processing of the image signal. These methods consist of comparing a current image produced by the camera with a reference image, and spatially moving the current image so as to bring it into line with the reference image. Different techniques are employed to do this.

One such technique is described in GB0012349.7, which discloses a method for stabilising the video images of a camera. This approach uses a global motion estimation in which explicit horizontal and vertical components are used as a measure of how much a current image needs to be shifted to provide a best match against a reference image. These components are high-pass filtered before being input to the processing, so allowing slow camera movements to go through to the output video signal, whereas sudden movements are input to the stabilisation processing. This system is susceptible to errors if something in the scene at which the camera is pointing is moving, and it has no facility for coping with deliberate panning or zooming of the camera.

According to the present invention there is provided a video image stabilisation system that is arranged to receive one or more signals representative of a plurality of images wherein, for an image n following at least an image (n−1) and an image (n−2), the system is arranged to estimate a Global Motion Offset (GMO) value between image n and a previous image representative of the spatial separation between the image n and the previous image, and apply a corrective movement to the image n based upon this GMO, characterised in that:

the system is arranged to estimate the GMO for the image n with reference to a mask that represents a region or regions of the image n that are not to be considered in the GMO estimation, the region(s) being region(s) estimated as likely to mislead the calculation of the GMO.

The present invention is able to reduce the effect of unwanted camera movements under some circumstances. It provides a system for the reduction of the effects of camera movement on the resultant video image. The system uses previous images in the video sequence as a reference when calculating the movement of the current image.

The GMO is a measure or estimation of the distance of the current image n from the reference image r. This distance is typically a vector comprising the number of pixels that n is from r, in both the horizontal and vertical planes; i.e. it indicates the shift that should be applied to n so as to get a best match between n and r, and is calculated by any suitable means. Note that different means for calculating this best match may produce different results, not all of them being the optimum value. Calculating the optimum value may be too time consuming, or require too much computer resource, and so other techniques may be employed that give an approximate value for the offset. The reference image r is preferably the previous image in the video sequence, i.e. image (n−1), as this is likely to be closest, in terms of having the lowest GMO, to the current image. Images other than the previous image may be used as r however, such as (n−2) or (n−5), but depending on the level and frequency of vibration these are likely to have larger GMOs.
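
By way of illustration only, the sketch below (in Python with NumPy) shows one simple, if computationally expensive, way such an offset might be found: an exhaustive search over candidate integer shifts, scored by the sum of absolute differences. Neither the search strategy nor the match measure is prescribed by the invention, and the embodiment described later uses a more efficient gradient-based method; the function name and parameters here are illustrative only.

    import numpy as np

    def estimate_gmo_exhaustive(img_n, img_r, max_shift=8):
        # Try every integer (dx, dy) shift of image n within +/- max_shift
        # and return the one giving the lowest sum of absolute differences
        # against the reference image r over the common central region.
        best, best_err = (0, 0), float("inf")
        h, w = img_r.shape
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                shifted = np.roll(np.roll(img_n, dy, axis=0), dx, axis=1)
                a = shifted[max_shift:h - max_shift, max_shift:w - max_shift]
                b = img_r[max_shift:h - max_shift, max_shift:w - max_shift]
                err = np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()
                if err < best_err:
                    best_err, best = err, (dx, dy)
        return best  # the (horizontal, vertical) pixel offset, i.e. the GMO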

Further information is used in estimating the GMO to get improved performance. A mask is one such piece of information. This is a binary image that is used to exclude pixels in the image n that may adversely affect the GMO calculation. The mask is preferably generated by examining the image n for objects moving within sub-regions of the image, and then setting the corresponding bits of the mask to exclude these areas. This is preferably done by dividing the image n into a plurality of sub-images n_(s), and calculating a Local Motion Offset (LMO) for each sub-image n_(s). The LMO may be calculated in the same general manner as the GMO, but different techniques may be more suitable due to the fact that each sub-image n_(s) is smaller than the image n. The LMO may be calculated using a corresponding sub-image taken from the same reference image r as used with the GMO, but preferably the corresponding sub-image taken from image (n−2) is used. Again, other reference images may be used.

The mask is preferably augmented to correspond to areas of the image n which are represented by pixels that do not behave with a desired characteristic. These pixels may be “dead” pixels, which appear to be always on or always off, or could be pixels that behave in an irregular manner to the incoming light. They may also be pixels that are deemed to be corrupted by noise above some threshold. The detection of noise may be done in any suitable manner. Such areas are known herein as anomalous areas, and the pixels making up the areas as anomalous pixels.

The images used to calculate the GMOs and LMOs are preferably sub-sampled before the calculation takes place. This has benefits in that the calculation effort is reduced, and the low-pass filtering inherent in the sub-sampling process makes the stabilisation system more resilient to image noise. Local minima in the calculations are also less likely to be a problem. For improved accuracy, the GMO or LMO calculations may be iterated at multiple resolutions, starting at a lower resolution, generating a GMO/LMO from this, and then moving to a higher resolution taking account of the GMO/LMO calculated in the previous iteration. Multiple iterations at a given resolution may also be done before moving to a higher resolution.
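
The multi-resolution iteration may be sketched as follows; this is an assumption-laden outline rather than an exact procedure, and estimate_offset stands for any single-resolution estimator, such as the gradient-based method described later. Plain decimation is used here for brevity in place of proper bilinear sub-sampling.

    def coarse_to_fine_offset(img_n, img_r, estimate_offset, levels=3):
        # Estimate at the coarsest resolution first, then refine at each
        # finer level, seeding every level with the previous estimate
        # scaled up to the new resolution.
        offset = (0.0, 0.0)
        for level in reversed(range(levels)):       # level 0 = full resolution
            step = 2 ** level
            sub_n = img_n[::step, ::step]           # crude decimation for brevity
            sub_r = img_r[::step, ::step]
            offset = estimate_offset(sub_n, sub_r, initial=offset)
            if level > 0:                           # rescale for the next level
                offset = (offset[0] * 2, offset[1] * 2)
        return offset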

The GMOs and LMOs calculated for the input images may advantageously be used to estimate whether a pan or zoom operation has been applied to the camera. Here, pan is taken to mean movement of the camera such that it points in a different direction, either horizontally or vertically, or both, and zoom is taken to mean that the focal length of the camera lens is changed such that a different field of view is seen by the camera.

Usefully, a prediction of motion offset errors can be made using the LMOs and GMOs. One such error is “lock-on”. This may occur when, for example, substantially the whole scene visible to a camera is taken up with a moving object. For example, if the camera were pointing at a road scene, and a very large lorry were to pass close to the camera, then much of the image recorded by the camera may be taken up by the lorry. Without any error correction, this would give the impression of a sudden pan taking place, which would cause the stabilisation routine to erroneously try to track the movement.

Another such error relates to “static camera detection”. Analysis of the GMO history is used to predict when the camera is not moving. If this state is detected, then the GMO vector is set to zero. Without this, the accumulation of errors in small GMOs can cause erroneous stabilisation to occur.

The GMO vector is preferably translated into a final stabilisation offset (SO), which represents the vector to be applied to the current image after calculation and processing of the GMO vector has been carried out. This translation preferably takes into account the motion offset errors and estimates of pan and zoom operations. The translation preferably involves a decay factor that tends to reduce the influence of SOs applied to previous images. This is useful, as respective SO vectors tend to accumulate, such that an offset applied to an image may otherwise remain even if the requirement for such an offset has ended.

A video signal that has been processed according to the current invention may result in the edges of the video image not being aligned with the edge of the display area of a display device such as a television screen. This is as a result of the image being shifted relative to the display device according to any detected motion offsets. Preferably, such edges, which may be rapidly changing in position, are kept hidden from view by means of the addition of a border area between the image edge and the display area of a display device. More preferably, the border is adapted to change size according to the displacement of the images relative to the display area. This may take into account the displacement of prior images as well as the one currently being displayed.

Alternatively, any blank areas around the image to be displayed caused by shifting of the image may be augmented with image information from prior images. In this way, a full image having no artificial borders can be presented to the display device.

As a further alternative, the image to be displayed may be expanded in size such that any blank areas are filled. This may be done by scaling the image using known algorithms.

Note that the video signal or signals input to the system may come directly from a camera, or they may come from some other means such as a video recorder or digital image storage on a computer system, or a mixture of such sources.

The present invention may be implemented on a computer system, including those incorporating a general purpose microprocessor, and those incorporating a Digital Signal Processor device. A computer can be programmed so as to implement an image stabilisation system according to the current invention.

The invention will now be described in more detail, by way of example only, with reference to the following Figures, of which:

FIG. 1 diagrammatically illustrates the hardware upon which the current invention may be implemented;

FIG. 2 is a data-flow diagram that shows the top level operation of an embodiment of the current invention;

FIG. 3 shows in more detail the step of estimating the image motion offsets and Local Motion Masks;

FIG. 4 shows in more detail the calculation of the GMO for each image n;

FIG. 5 shows in more detail the calculation of the LMOs for each image n;

FIG. 6 shows in more detail the process of generating the mask image;

FIG. 7 shows in more detail the steps involved in correcting for motion offset errors in the GMO;

FIG. 8 shows in more detail the operation of camera pan and zoom detection;

FIG. 9 diagrammatically illustrates the use of the outer set of LMOs for an image n in the detection of a zoom operation;

FIG. 10 shows in more detail the steps involved in generating a final stabilised image given the previously calculated information; and

FIG. 11 shows the effect of the dynamic border generation on the stabilised image.

FIG. 1 illustrates a typical hardware arrangement that can be used to implement the current invention. In this example the video signal is generated by a video camera 1 mounted upon a mount 2. The camera is subject to buffeting by the wind, which, if sufficiently strong, will cause the camera to wobble on the mount, as illustrated by the arrows 3. The camera supplies a video signal to the stabilisation system 4, the output of which is a video signal that has been processed as described herein, which may then be displayed on a suitable display 5 or recorded on some suitable medium. The invention may be applied to many different video signal formats, both digital and, with suitable digitisation, analogue; the current embodiment is set up for processing PAL and NTSC signals.

The stabilisation system 4 carries out the processing of the signal to attempt to reduce any instability of the camera image. The system 4 contains an analogue to digital converter (ADC) 6 that digitises the incoming analogue video signal. The digital signal is then fed to a signal processor 7. This processor 7 is able to perform complex calculations and manipulations upon the incoming data stream and provide an output signal that may be converted to an analogue signal by the digital to analogue converter 8 ready for replay on the display unit 5. The processor 7 is connected to a digital framestore 10 that is able to store the current image n from the camera, as well as the previous two images (n−1), (n−2). These are used in the processing. The processor 7 is also connected to a general digital memory 9. This memory 9 holds the program that implements the current invention, as well as being used as a general storage area for data generated in connection with the operation of the invention.

The ADC 6 digitises the incoming signal at a resolution of 720×288 (for PAL) or 720×240 (for NTSC), although only the central portion of this is used as an input to the processing algorithm, as the outer parts of the image may represent parts of the scene that are not present in two successive images due to the camera movement. Also, certain camera types contain inactive pixels at the borders of the image area. The active area used in the current embodiment has a resolution of 576×256 (for PAL) or 576×224 (for NTSC). Of course, the results of the processing are applied to the whole of the digitised image.

FIG. 2 shows a top level data-flow diagram of the operation of one embodiment of the current invention. This assumes that at least two images have been previously captured, and that the current image n has been placed in the framestore 10. This is going to be the case for all occasions apart from the time when the system is first switched on.

The processing relating to image n is as follows. Images n 100, (n−1) 101 and (n−2) 102 are presented to module 104 which calculates the GMO and LMOs for image n, as well as a mask image. Details of how these offsets and mask are calculated are provided later. The image is divided up into 6×8 (for PAL inputs) or 6×7 (for NTSC) regions for calculation of the LMOs, although of course a different number of regions may be used. Note that the mask may be referred to herein as a Local Motion Mask (LMM), as its primary task is to mask out areas of the image where local movement has been detected. The mask may, however, be set to include anomalous areas or pixels. The mask may also be a composite mask, derived from any local motion detected and anomalous areas, although it may still be referred to as an LMM.

Following calculation of the GMO, LMOs and the mask, this embodiment of the invention analyses these values and previous such values to check for motion offset errors. This is done in module 105, and produces “error corrected” offsets GMO_(EC)(n) and LMOs_(EC)(n). The data is used to detect lock-on errors and static camera errors. The details of how this is done are described in relation to FIG. 7 below.

The current embodiment then examines GMO_(EC)(n) and LMOs_(EC)(n) as modified in module 105 to detect, and compensate for, desirable camera movements, namely pan and zoom. This detection produces a “camera state” value, which is then used to adjust how the calculated offset is applied to the image n. The detection is done in a camera model module 106. Details of how the pan and zoom states are detected are provided below, with reference to FIG. 8.

The state outputs of module 106 and the GMO_(EC)(n) value calculated in module 105 are now used to calculate the final stabilisation offset to be applied to the image n. This is done in module 107, which produces an output offset SO(n). This offset is passed to the display routine module 108, which shifts image n according to the value of SO(n), and sends the signal representative of this image to a display device or recording medium, after first applying any borders as described below.

FIG. 3 shows in more detail the process of generating the GMO, LMOs and LMM for each image n, as used in the current embodiment. Images n, (n−1) and (n−2) 100, 101, 102 are supplied to module 104. Images n and (n−1), and the LMM calculated using image (n−1) in the previous iteration, are used to estimate a GMO in step 109, based upon a known algorithm described later in this specification, with reference to FIG. 4. Images n and (n−2) are also used to estimate the LMOs for image n, in step 110. For this, the image n is divided up into an array of 6×8 (for PAL) sub-images, or local images, and an LMO estimated for each one. The last-but-one image (n−2) is used in the comparison as the difference between this and the current image is likely to be greater than if the last image is used, leading to better detection of movement within the local sub-image—it has been found that local motion tends to be smaller than global motion. The algorithm used to calculate each LMO is quite similar to that used to calculate the GMO, and is described in more detail later, with reference to FIG. 5.

The LMOs generated for image n (represented as LMOs(n)) are used, along with the GMO for image n (GMO(n)), to generate a mask. The mask is generated in module 111. This mask is used in the estimation of GMO(n+1) and has 1 bit of information for each of the pixels in image n. If this bit is a 0, then the corresponding pixel in the image (n+1) is not used in the calculation of GMO(n+1). If the bit is a 1 then the corresponding pixel in the image (n+1) is used in the calculation of GMO(n+1). The mask is stored in memory 112 until it is to be used. The mask is used to mask out those areas of the image where local motion—which would otherwise distort the calculated GMO value—has been detected, and is also used to mask out anomalous areas. An estimate 114 of the level of noise in the image is carried out, and stored 112 for later use. More detail of the mask generation and noise estimation is provided later, with regard to FIG. 6.

FIG. 4 shows in more detail the steps taken in module 109 in the generation of the GMOs. The inputs to this module are the current image n 100 and previous image (n−1) 101, the previously calculated LMM 113 and the noise level estimate 114. There are three main steps to the procedure:

In the first, the input images and the LMM are sub-sampled in modules 115 to reduce their size, and hence increase the speed and noise resilience of the operation;

The second does the main calculation of the GMO, in module 116, based on the sub-sampled image;

The third corrects the calculated GMO, in module 117, to take account of the sub-sampling.

The sub-sampling of the image is done using a bilinear interpolation process, which will be familiar to a person skilled in the relevant art, and will not be described further herein. More details of this can be found in Sonka, Hlavac and Boyle, “Image Processing and Machine Vision”, 2nd edition, 1998, (PWS) Brooks/Cole Publishing. The current embodiment sub-samples to reduce the resolution by a factor of 4 in the vertical dimension and by a factor of 8 in the horizontal dimension. This has been found to produce adequate results whilst giving a useful reduction in computation time. Other benefits of sub-sampling include a lower susceptibility to noise in the image, along with a reduced probability of the GMO calculation being confused by local minima, as the sub-sampling effectively low-pass filters the image.
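
Purely as an illustration of the resolution reduction (the embodiment itself uses bilinear interpolation as noted above), a block-averaging sub-sampler with the same factors might look like the sketch below; the averaging also provides the low-pass filtering effect just referred to.

    import numpy as np

    def subsample(img, fy=4, fx=8):
        # Reduce resolution by a factor of 4 vertically and 8 horizontally
        # by averaging each fy-by-fx block of pixels.
        h, w = img.shape
        h, w = h - h % fy, w - w % fx                 # trim to whole blocks
        blocks = img[:h, :w].reshape(h // fy, fy, w // fx, fx)
        return blocks.mean(axis=(1, 3))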

The calculation of the GMO value involves calculating the translation that needs to be applied to image n so as to minimise the misregistration of the image n with image (n−1), as sub-sampled. A measure of this misregistration is found by summing the intensity differences pixel-by-pixel between the images n and (n−1) (here represented as I_(n) and I_(n−1) for clarity), to create an error value E.

The procedure attempts to minimise the square of this error value. Thus,

E = Σ_(x,y) [I_(n)(x,y) − I_(n−1)(ƒ(x,y), g(x,y))]²   (Eqn 1)

where I_(n)(x,y) represents a point x,y within image n and ƒ(x,y) and g(x,y) are transformations of the co-ordinates x and y (i.e. each pixel location) respectively that transform the image co-ordinates of image n into those of image (n−1). A Taylor expansion of Eqn 1 yields an equation that is conveniently analysed, and by means of differentiating this with respect to the parameters of the transformation functions ƒ and g and setting these to zero, the resulting equation may be solved to reveal the latest update to the transformation, or GMO. Note that the invention pre-processes the images before doing the GMO calculation to provide a more accurate result. This pre-processing involves multiplying on a pixel-by-pixel basis the image (as sub-sampled) with the mask (as similarly sub-sampled), effectively reducing the active area of the image n, which has the effect of improving the accuracy and simplifying the calculations.
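
A minimal sketch of Eqn 1 for the translation-only case handled by the present embodiment is given below. For brevity it applies the mask to the difference image rather than pre-multiplying the images as described above (the excluded pixels are the same), and it assumes integer shifts, whereas the gradient-based solver works with sub-pixel values.

    import numpy as np

    def masked_error(img_n, img_r, mask, dx, dy):
        # Eqn 1 with f(x,y) = x + dx and g(x,y) = y + dy: shift image n by
        # the candidate offset and sum the squared intensity differences
        # against the reference r, counting only pixels where mask == 1.
        shifted = np.roll(np.roll(img_n, dy, axis=0), dx, axis=1)
        diff = (shifted.astype(np.float64) - img_r) * mask
        return np.sum(diff ** 2)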

The use of a Taylor series approximation of the error equation, Eqn 1, introduces approximation errors, as the current embodiment only uses the first order term. This can result in the process finding local minima of the error function, and hence providing an incorrect GMO. To reduce the effects of the approximation, the process of calculating the GMO described above is iterated. An initial motion estimate 200 of 0.0 is used, which is updated at each iteration. Each iteration uses an updated version of the current image n, warped, or shifted, by the latest estimate 201 of the transformation parameters (i.e. the latest value of the current GMO being calculated).

In calculating the GMO the current embodiment also uses an annealing process to improve accuracy and help mitigate errors caused by objects moving through the scene. The annealing process decides whether any given pixel is to be used in the current GMO calculation iteration. It does this by looking at the absolute difference between each pixel in the image (n−1) and the corresponding pixel in the image n that has been warped or shifted by the current value of the GMO being calculated. If this difference is greater than a threshold value then it is not used in the next iteration. The noise level estimate 114 is used in calculating the threshold value. This process excludes pixels that do not line up despite having been warped—the cause of this being most likely due to image anomalies or movement of objects through the scene. As the images become more aligned due to the iterations of the process, the threshold chosen is decreased, as more of the pixels in the images should line up, resulting in a reduced absolute difference value. This process implements a form of robust statistical estimation. Other such robust statistical estimation methods are known and are applicable to the current invention.
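
The per-iteration pixel exclusion may be expressed as in the sketch below. The decay factor applied to the threshold is an assumption made for illustration; the embodiment derives its thresholds from the noise level estimate 114 but no exact schedule is specified here.

    import numpy as np

    def annealing_keep_mask(img_n_warped, img_prev, threshold):
        # Pixels whose absolute difference still exceeds the threshold after
        # warping are taken to be moving objects or anomalies, and are
        # excluded from the next iteration of the GMO calculation.
        diff = np.abs(img_n_warped.astype(np.float64) - img_prev)
        return diff <= threshold

    def next_threshold(threshold, noise_level, decay=0.7):
        # Tighten the threshold as the images come into alignment, but never
        # below the noise floor. The decay value here is illustrative only.
        return max(threshold * decay, noise_level)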

The current embodiment calculates only the translation of the image n that provides an improved registration. Hence, for this case, ƒ(x,y) = x + Δx and g(x,y) = y + Δy. Rotational and scaling errors are not currently considered but the invention may equally be applied, with suitable adaptation, to stabilise a video signal derived from a source that may be susceptible to rotational or scaling instabilities. This adaptation involves representing the functions ƒ(x,y) and g(x,y) in terms of translational, rotational and scale parameters thus: ƒ(x,y) = Δx + bx − cy and g(x,y) = Δy + cx + by, where the scaling factor = (b² + c²)^(1/2) and the degree of rotation = tan⁻¹(c/b). These equations are then solved in a similar fashion to that described above. More information on this and other aspects of the calculation of the GMO may be found in Kent, P, “Multiresolution Image Registration”, IEE Colloquium on Multiresolution Modelling and Analysis in Image Processing and Computer Vision, 1995, and in Kent, P, “Multiresolution Image Registration and Mosaicing”, Journal of Defence Science, Vol. 1 No. 2, the contents of both of which are hereby included by reference. Note that these references detail a multiple resolution technique, whereby the GMO value calculated at a lower resolution is then applied to a subsequent GMO calculation performed upon an increased resolution version of the image. This can be repeated as necessary to get the required accuracy. The current embodiment has been found to give satisfactory results with a calculation performed at a single resolution, but may be adapted to use multiple resolutions if required. Other methods of calculating the GMO and LMOs exist and are applicable to the current invention.

Following the estimation of the GMO as calculated above, the value obtained is multiplied 117 by the same factors used in the sub-sampling to take account of the change in resolution of the image. Note that the current embodiment uses sub-sampled versions of the image only for the calculation of the GMO and LMOs. All other operations work on the image in its original resolution.

FIG. 5 shows in more detail the process used to calculate the set of LMOs for each image n. The process is based on a simplified version of that used to calculate the GMO as described above. The image n is again sub-sampled as before to reduce the workload and image noise. After this, the sub-sampled image is divided up, in modules 120, into 6×8 sub-image blocks (for PAL), each of size 12×8 pixels. Each sub-image block is passed to a simplified version 121 of the GMO estimation routine, which lacks both the masking and annealing functions. The calculation is done iteratively as before, using an initial motion estimate 202, which is updated at each pass, as indicated by numeral 203. The masking and annealing functions are not needed due to the small size of each sub-image being processed. The vectors produced by module 121 are then multiplied up in module 122 by the same factor used in the sub-sampling to account for the reduced resolution of the images used in the processing. The current value of the GMO for image n, and for image (n−1), are then subtracted from the calculated LMO values. This ensures that the LMO values are not corrupted by camera movements. The resulting LMOs 123 are vectors that hold the horizontal and vertical shift required to get a best match between each sub-image of n and the corresponding sub-image of (n−2).
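
The final correction step may be written, for one image's worth of LMOs, as the sketch below; representing the LMO and GMO values as (horizontal, vertical) tuples is an assumption made for illustration.

    def correct_lmos(raw_lmos, gmo_n, gmo_n1):
        # Subtract the global motion accumulated over the two-image gap,
        # i.e. GMO(n) plus GMO(n-1), so that each LMO reflects only the
        # movement local to its sub-image block.
        return [(dx - gmo_n[0] - gmo_n1[0], dy - gmo_n[1] - gmo_n1[1])
                for (dx, dy) in raw_lmos]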

FIG. 6 shows in more detail the process of generating the mask. This mask is used as described above to remove from the relevant calculations pixels connected with both local motion effects and pixels behaving anomalously. If a sub-image has an LMO greater than a given threshold then the region corresponding to the whole sub-image is masked off by setting the appropriate mask bits to 0. This mask is the LMM, and is calculated in module 125. The LMM, in this embodiment, includes masking that corresponds to regions of the image n in which anomalous pixel behaviour has been detected.

The anomalous pixels are found in the following way. An image representing the absolute difference between image n, as shifted by the current GMO, and image (n−1) is produced 124, i.e. I_(diff)(n) = |I(n) − I(n−1)|. The parts of the LMM mask due just to motion effects as derived above are also used, such that I_(diff)(n) only comprises those sub-images where significant local motion has not been detected.

The intensity levels of the resulting difference image are then examined in module 126. This is done by first generating a distribution of the pixel intensities of I_(diff)(n). A threshold is then set, as the lowest decile of this range, multiplied by 5—this factor having been chosen empirically to give a reasonable degree of robustness to motion anomalies. All pixels above this intensity are then regarded as anomalous, and so the corresponding bits in the LMM are set to a zero, in module 127, to exclude them from the relevant operations, as shown in FIG. 3. This anomalous pixel threshold is used as the noise level estimate 114 used in the calculation of the GMO.
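
The thresholding just described reduces to a few lines; using np.percentile to obtain the lowest decile is an implementation choice rather than a requirement of the embodiment.

    import numpy as np

    def noise_threshold(diff_img):
        # Lowest decile (10th percentile) of the difference-image intensity
        # distribution, multiplied by the empirically chosen factor of 5.
        return 5.0 * np.percentile(diff_img, 10)

    def anomalous_pixels(diff_img):
        # True where a pixel is to be treated as anomalous, i.e. where the
        # corresponding LMM bit should be cleared to 0.
        return diff_img > noise_threshold(diff_img)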

FIG. 7 shows a data-flow diagram of the motion offset error correction of module 105. For this purpose, module 105 has access to the GMO vectors from the current 118 and previous images 128 and the set of LMO vectors 123 from the current image that are stored in the system memory. Module 105 first does a lock-on estimation 129 to check for distortion of the GMO caused by very large moving objects. It does this using GMO(n), LMOs(n), and GMO(n−1) to GMO(n−25) (PAL), or to GMO(n−30) (NTSC), which have been stored in memory previously.

A lock-on is deemed to have occurred when:

i. GMO(n−1) to GMO(n−25) (or GMO(n−30) as appropriate) are all less than a given threshold value (5.0 pixels in distance is currently used);

ii. GMO(n) is greater than the given threshold; and

iii. More than 50% of the LMOs(n) are greater than a given threshold (4.0 pixels in distance is currently used).
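
These three conditions translate directly into code; the sketch below assumes, as the embodiment does (see the following paragraph), that only the horizontal components of the vectors are examined.

    def lock_on(gmo_hist, lmos, gmo_thresh=5.0, lmo_thresh=4.0):
        # gmo_hist[0] is the horizontal component of GMO(n); gmo_hist[1:]
        # holds GMO(n-1) to GMO(n-25) (or GMO(n-30) for NTSC).
        prev_all_small = all(abs(g) < gmo_thresh for g in gmo_hist[1:])   # i
        current_large = abs(gmo_hist[0]) > gmo_thresh                     # ii
        majority_moving = (sum(abs(l) > lmo_thresh for l in lmos)
                           > 0.5 * len(lmos))                             # iii
        return prev_all_small and current_large and majority_moving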

The current embodiment only looks at the horizontal component of the GMO and LMO vectors, although it is of course possible to use either or both components. When a lock-on is deemed to have occurred, the horizontal components of the GMO(n) and LMOs(n) are set to zero. This has the effect of stopping all stabilisation effects for this axis, for this particular image in the sequence. Stabilisation in the vertical axis can still occur however.

Module 105 also analyses the current GMO and the GMO recent history to estimate 130 whether the camera is actually stationary. It does this by looking at approximately one second's worth of the GMOs, equating to GMO(n) to GMO(n−49) (for PAL signals). If all examined GMOs are less than a threshold—usually set to 1 pixel for both the horizontal and vertical axes—then the camera is deemed to be static. In an alternative method, the variance of the examined GMOs is calculated, and similarly thresholded. If the variance is below a threshold—currently set to 1.5 pixels—and the one second's worth of GMOs are also less than a threshold as described above, then the camera is deemed to be static. When a static state is detected, the current horizontal and vertical components of GMO(n) are set to zero, effectively disabling all stabilisation for that image.
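
Both the basic test and the variance-based alternative may be sketched as follows, for a single axis; representing the history as a simple list of one second's worth of GMO components is an assumption.

    import statistics

    def camera_static(gmo_hist, thresh=1.0, var_thresh=1.5, use_variance=False):
        # gmo_hist holds roughly one second of GMO values (50 for PAL).
        all_small = all(abs(g) < thresh for g in gmo_hist)
        if use_variance:   # the alternative method described above
            return all_small and statistics.variance(gmo_hist) < var_thresh
        return all_small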

The error corrected offsets from module 105 are termed GMO_(EC) 130 and LMOs_(EC) 131.

FIG. 8 shows in more detail the operation of detecting the camera state 138—i.e. whether a pan or a zoom operation has taken place. In detecting a pan, the difference between two GMO_(EC) sums is calculated 133. One is the sum 134 of the GMO_(EC) values from GMO_(EC)(1) to GMO_(EC)(n) (i.e. the accumulation of GMO_(EC) values since the system was switched on). The other 135 uses the same offsets passed through a high-pass filter before summation. The filter used is a second-order Bessel filter with a cut-off frequency of 0.5 Hz, although a person skilled in the relevant arts will realise that there are many filter characteristics that will be suitable. More details of the calculation of this filter characteristic can be found in Rabiner, L. R., and B. Gold, “Theory and Application of Digital Signal Processing”, Prentice Hall, 1975, pp. 228-230. A large enough difference between these sums indicates the presence of low-frequency global motion, typical of a pan. Note that the pan detection described above is similar to that which would be achieved by low-pass filtering the GMO_(EC) sum; however, the above method is used as the high-pass filtered values are used in other processing, and so are already present in the system memory. The effort of calculating the low-pass filtered values is thus saved.

If the difference between the sums exceeds a threshold (set to 50.0 pixels in this embodiment), then a pan is deemed to have occurred. Until this threshold is exceeded, the pan detection shows a No Pan State. The first time the threshold is exceeded, the pan detection shows a Possible Pan State. If this happens for several consecutive images (set to 30 images in this embodiment) then there is enough evidence of a pan and the pan detection shows a Pan State.

Once the pan detection shows a Pan State, it will continue to show a Pan State until the difference between the sums does not exceed the threshold. To smooth the transition, the pan detection will show the Possible Pan State for a few images (set to 30 images in this embodiment) before returning to the No Pan State.
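
The pan detection hysteresis may be modelled as a small state machine. The sketch below uses four internal states so that the Possible Pan phase on the way into a pan can be distinguished from the one on the way out; where the description above is silent on a corner case, the behaviour chosen here is an assumption.

    def update_pan_state(state, streak, exceeded, confirm=30, release=30):
        # state: "NO_PAN", "RISING" (possible pan, entering), "PAN" or
        # "FALLING" (possible pan, leaving). 'exceeded' is True when the
        # difference between the two sums is above the 50.0 pixel threshold.
        if state == "NO_PAN":
            return ("RISING", 1) if exceeded else ("NO_PAN", 0)
        if state == "RISING":
            if not exceeded:
                return ("NO_PAN", 0)
            streak += 1
            return ("PAN", 0) if streak >= confirm else ("RISING", streak)
        if state == "PAN":
            return ("PAN", 0) if exceeded else ("FALLING", 1)
        if exceeded:                                 # state == "FALLING"
            return ("PAN", 0)
        streak += 1
        return ("NO_PAN", 0) if streak >= release else ("FALLING", streak)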

In detecting a zoom operation 136, the LMOs_(EC)(n) 132 from a group of local motion blocks centred around the centre of the image are examined. Typically this is a rectangular border, one block deep, around the edge of the grid of local motion blocks. This is illustrated in FIG. 9. Here, an image n is shown, represented by the large rectangle 11. An inner portion 12 is shown divided up into a set of 6×8 local motion blocks, e.g. 13. A zoom-in is detected if LMOs_(EC) from selected blocks appear to show movement generally towards the centre of the image. Likewise, a zoom-out is detected if the movement is generally away from the centre of the image. The selected blocks are those on the border (e.g. 14) of the inner portion 12 of image n. The blocks on the left and right side of the rectangle are examined to see if they show motion greater than some threshold in the horizontal axis. Similarly, the blocks on the top and bottom of the rectangle are examined to see if they show motion greater than some threshold in the vertical axis. It will be seen therefore that corner blocks contribute to both the horizontal and vertical analysis.

For each block in the group, the magnitude of the motion offset is compared against a threshold value. If a block has a motion offset component magnitude greater than a given threshold then that block is considered to have significant motion.

For each block that has significant motion, the direction of the motion relative to the centre of the image is used to judge whether that motion shows a zoom-in or zoom-out. All blocks within the group are then examined to decide if a zoom operation is in progress. In the current embodiment, a zoom is deemed to have occurred if the following inequality is satisfied:

Abs(N_(Z(in)) − N_(Z(out))) ≥ ½N_(B)   (Eqn 2)

where N_(Z(in)) is the number of blocks in the group indicating a zoom in; N_(Z(out)) is the number of blocks in the group indicating a zoom out; and N_(B) is the total number of blocks in the group. Of course, the direction of the zoom can be found by comparison of N_(Z(in)) and N_(Z(out)).

The inequality (Eqn 2) may in fact be calculated twice. It is first calculated with values for N_(Z(in)) and N_(Z(out)) which include only those blocks where a motion offset component greater than 5 pixels occurs. If the inequality is then satisfied, then a “fast zoom” is deemed to be occurring. If the inequality is not satisfied then the calculation is repeated, this time including in N_(Z(in)) and N_(Z(out)) those blocks where a motion offset component of 1 or more pixels occurs. If the inequality is now satisfied then a “slow zoom” is deemed to be occurring. The reason for classifying a zoom as either a fast or slow zoom is that it has been found that better stabilisation is achieved by handling them differently. The difference in handling for the two states is given below.
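
The two-pass application of Eqn 2 might be coded as below. For compactness the direction test uses a dot product against the line to the image centre, which is a simplification of the separate horizontal and vertical tests described above; the representation of block positions and offsets as tuples is likewise an assumption.

    def detect_zoom(border_lmos, centre, fast_thresh=5.0, slow_thresh=1.0):
        # border_lmos: list of ((bx, by), (dx, dy)) pairs giving each border
        # block's centre and its LMO vector.
        def count(thresh):
            n_in = n_out = 0
            for (bx, by), (dx, dy) in border_lmos:
                if max(abs(dx), abs(dy)) <= thresh:
                    continue                   # no significant motion here
                inward = (centre[0] - bx) * dx + (centre[1] - by) * dy > 0
                n_in += inward
                n_out += not inward
            return n_in, n_out

        n_b = len(border_lmos)
        for label, thresh in (("fast", fast_thresh), ("slow", slow_thresh)):
            n_in, n_out = count(thresh)
            if abs(n_in - n_out) >= 0.5 * n_b:            # Eqn 2
                return label, ("in" if n_in > n_out else "out")
        return None, None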

If a zoom state is detected for more than a number of consecutive images (2 for a fast zoom and 10 for a slow zoom in the current embodiment) then there is enough evidence of a zoom and the zoom detection shows a zoom state.

Once the zoom detection shows a fast or slow zoom state, it will continue to show a zoom state until a zoom has not been detected for a number of consecutive images (10 in the current embodiment). To smooth the transition, the zoom detection will show the possible zoom state for a few images (again, 10 in the current embodiment) before returning to the no-zoom state.

If both a pan and a zoom are detected for a given image n, then a State Arbitration procedure 137 is used to decide which of these is the more important. The procedure works by assigning a priority to each of the states, with the highest priority one being acted upon, and the others ignored. The order used in the current embodiment is shown in Table 1, in order of decreasing priority:

TABLE 1

1. Fast Zoom
2. Pan
3. Slow Zoom
4. Possible Pan
5. Possible Zoom
6. No Pan or Zoom detected

Note that state 6 of Table 1 is selected by default if no other states are observed.

The camera state 138 as predicted by module 106 is fed into module 107 (see FIG. 2) where the stabilisation offset (SO) to be applied to image n is calculated. The other main input to module 107 is the error corrected GMO_(EC)(n) 131. The detected camera state is used to generate a final offset SO(n) 139 to be applied to the image n. This is done as indicated in Table 2.

TABLE 2

Detected state | Final image stabilisation offset, SO(n)
1              | SO(n) = rapid decay constant × SO(n−1)
2              | SO(n) = rapid decay constant × SO(n−1) + HPF(GMO_(EC)(n))
3              | SO(n) = decay constant × SO(n−1) + HPF(GMO_(EC)(n))
4, 5 or 6      | SO(n) = decay constant × SO(n−1) + GMO_(EC)(n)

The decay constants in Table 2 are used to decay the accumulated image offset over time. This slightly reduces the effect of stabilisation whilst improving the amount of image visible. If no camera motion is detected the decaying offset will eventually return the image to its initial starting position. It is particularly useful in the situation where camera shake ceases but the calculated offset does not return to zero.

Also, if an embodiment of the current invention is produced that does not have the capability to detect and correct for pan or zoom movements (which may be done to increase processing speed and image throughput, for example), and the embodiment is inadvertently used with a panning or zooming camera, then the decay allows the system to work, albeit with slightly reduced fidelity during panning or zooming operations. The rapid decay constant currently used is 0.735, and the standard decay constant used is 0.98.
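
Table 2, together with the constants just quoted, reduces to the following sketch; hpf stands for the same high-pass filter operation used in the pan detection, and is assumed to be supplied by the caller.

    def stabilisation_offset(state, so_prev, gmo_ec, hpf):
        # Table 2 expressed as code; states 1 to 6 follow Table 1.
        RAPID_DECAY, DECAY = 0.735, 0.98
        if state == 1:                                    # Fast Zoom
            return RAPID_DECAY * so_prev
        if state == 2:                                    # Pan
            return RAPID_DECAY * so_prev + hpf(gmo_ec)
        if state == 3:                                    # Slow Zoom
            return DECAY * so_prev + hpf(gmo_ec)
        return DECAY * so_prev + gmo_ec                   # states 4, 5 or 6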

The high-pass filter (HPF) operation used in Table 2 is the same as that done in the pan detection, described above.

The offset SO(n) 139 as derived above is next applied, along with the camera state 138, to the image n to effect the stabilisation of the video signal. This is done in two stages, as indicated in FIG. 10. The first stage 140 shifts the image n according to the vector SO(n) 139. This shifting of the image may result in some borders of the image area not having any data, and hence being blank. The blank areas in successive images may be of different sizes, which would result in a flickering effect as the rapidly moving edges of the image sequence are presented to the output display device. The second stage of the display process therefore is to generate “dynamic” borders 141 that cover these blank areas.

The dynamic borders hide the rapidly moving edges of the stabilised image sequence. This is done by overlaying artificial black borders over the edges of the shifted image. These reduce the size of the visible image such that the rapidly moving edges are hidden. The borders continually adjust to show as much of the image as possible without showing the moving edges. The camera state, the stabilisation offset SO(n) and a history of the SO values are used to determine the amount of border shown. When a pan or zoom is occurring or there is little image motion, the dynamic borders decay to show the edges of the image. Typically the border will cover an area up to the maximum excursion of the images detected within an offset history period of 50 images.

The offset history used for the border generation gains an entry for each image according to the camera state:

If a Fast Zoom State, Pan State, Slow Zoom State, Possible Pan State or Possible Zoom State is detected, then the value in the offset history for image n is set to 0.

If a No Pan or Zoom State is detected, then the value in the offset history for image n is set to SO(n).

To prevent the borders changing too rapidly, the borders are limited in their rate of change for each image. The maximum change in the vertical direction is limited to 5 pixels per image and the maximum change in the horizontal direction is limited to 10 pixels per image. These values have been found to work well, but other values could be used.
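
For one axis, the rate-limited border update may be sketched as follows; the representation of the offset history as a simple list of per-image values is an assumption made for illustration.

    def next_border(prev_border, offset_history, axis):
        # Target size is the largest excursion in the last 50 entries of the
        # offset history; movement towards that target is clamped to 5
        # pixels per image vertically and 10 horizontally.
        max_step = 5 if axis == "vertical" else 10
        target = max(abs(o) for o in offset_history[-50:])
        step = min(max_step, max(-max_step, target - prev_border))
        return prev_border + step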

The shifted image 142, with dynamic borders applied, is then converted to an analogue signal for replay on a monitor or recording to disk or tape. Of course, the digital signal could also be saved to a computer disk in any convenient format.

FIG. 11 shows the border generation in action. In a) a scene is shown at which a video camera, mounted on an unstable platform, is pointed. Assume that the camera is wobbling up and down. The field of view of the camera is the non-hashed portion 143. The large rectangle represents the larger scene 144 as seen by the camera at any time during its movement over the scene 144, over a period of a few seconds. It will be seen that the camera is, for this image, pointing towards the top of the scene 144, hence the lower portion of the scene 145, represented by the hashed region, is not present.

The image stabilisation system as disclosed herein, when presented with a sequence of images of which FIG. 11a was one, would tend to move the image 143 in towards the centre of the display area of the replay device, the upper and lower limits of which are here indicated by dotted lines 151. This would cause a gap at the top of the shifted frame. When the dynamic border is generated, the border at the top of the stabilised image is made at least as large as this movement, to hide this gap.

FIG. 11b represents the image recorded by the camera a short time period later, when the camera has moved due to its wobble and is now pointing towards the bottom 146 of the larger scene 144. Because of this the top 147 of the larger scene 144 has not been recorded in this frame. Again, the stabilisation routine would tend to move this image into the centre of the display area as represented by dotted lines 151, and the gap produced by this would also be covered when generating the dynamic border.

As well as the above mechanism causing blank areas at the top and bottom of the stabilised image, there are some areas of the larger scene 144 that are visible in one image but not in another. For example, in FIG. 11a the top of a tree 148 can be seen, whereas the ground cannot. Likewise, in FIG. 11b, the ground 149 can be seen, but the top of the tree cannot. If the borders just covered up the blank areas, then there would still be visible a flickering region adjacent these borders, caused by the scene being only visible at certain times. These are the rapidly changing edges referred to above. To hide these, the border is extended to cover this region, the size of which is determined by examining the maximum excursion (given by the stabilisation offset) of the image seen over the previous fifty images.

FIG. 11c shows the resultant stabilised image, with borders 150, within the display area indicated by numeral 151, generated as described above to cover the rapidly moving edges.

An alternative embodiment tackles the moving edge problem in a different manner. Here, where camera movement creates a blank area in an image, image information from a previous image is used to effectively overlay the blank area. This is done by creating a buffer image that comprises the current image, as shifted by the offset as described above, written onto the previous image(s) in the buffer, such that it overwrites only those parts where image data is present in the current image, and leaves untouched those parts in the buffer that correspond to blank areas of the current image. In this way, the buffer image grows into an image that is larger than the display area given by the borders 151, but is a composite of the current image and previous images. The buffer image will be the size of the larger scene 144. The part of this buffer image fitting within the limits of the display as given by dotted lines 151 is then output to the display or recording device, and thus ensures that no dead areas or border space need be displayed.
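
A minimal sketch of this compositing, assuming the stabilisation offsets are small enough that the shifted image always lands wholly inside the buffer, is given below; buffer and image are NumPy arrays of the same pixel type.

    import numpy as np

    def composite_and_crop(buffer_img, img_n, offset, display_shape):
        # Paste the current image into the scene buffer at its stabilised
        # position; pixels the current image does not cover keep the data
        # written by earlier images. Then cut out the fixed display window.
        dy, dx = offset
        h, w = img_n.shape
        bh, bw = buffer_img.shape
        y0 = (bh - h) // 2 + dy                # buffer centre plus the offset
        x0 = (bw - w) // 2 + dx
        buffer_img[y0:y0 + h, x0:x0 + w] = img_n
        dh, dw = display_shape
        vy, vx = (bh - dh) // 2, (bw - dw) // 2
        return buffer_img[vy:vy + dh, vx:vx + dw]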

A further embodiment expands the size of the image to be displayed by linearly scaling it such that the image covers those parts that would otherwise be blank due to the image shifting process.

The skilled person will be aware that other embodiments within the scope of the invention may be envisaged, and thus the invention should not be limited to the embodiments as herein described.

1. A video image stabilisation system for correction of camera motion, that is arranged to receive one or more signals representative of a plurality of images from an image source wherein, for an image n following at least an image (n−1) and an image (n−2), the system is arranged to estimate a Global Motion Offset (GMO) value between image n and a previous image representative of the spatial separation between the scene imaged in image n and the previous image, and apply a corrective movement to the image n based upon this GMO, characterised in that: the system is arranged to estimate the GMO for the image n with reference to a mask that represents a region or regions of the image n not to be considered in the GMO estimation, the region(s) being region(s) estimated as likely to mislead the estimation of the GMO.
2. A stabilisation system as claimed in claim 1 wherein the system is arranged to examine one or more local regions of the image n and corresponding local regions of a previous image, and estimate a local motion offset (LMO) representative of spatial separation between like features in corresponding local regions of the current and previous images, and if the, or each, LMO is greater than a given threshold, to set area(s) of the mask that correspond to this local region or regions to indicate omission from the GMO estimation.
3. A stabilisation system as claimed in claim 2 wherein the local regions comprise an array of rectangular regions.
4. A stabilisation system as claimed in claim 1 wherein the system is arranged to estimate the GMO of an image representative of image n but having a spatial resolution lower than image n.
5. A stabilisation system as claimed in claim 4 wherein the system is arranged to iterate the estimation of the GMO on a plurality of images each representative of image n, where each of the plurality of images has a different spatial resolution.
6. A stabilisation system as claimed in claim 1 wherein the system is arranged to adjust the GMO if a stationary camera state is detected, this state being indicated by means of a plurality of contiguous GMOs including the current GMO all being below a given threshold.
7. A stabilisation system as claimed in claim 2 wherein the system is arranged to adjust the GMO if intentional adjustment of the image source viewing direction (pan) or field of view (zoom) is detected.
8. A stabilisation system as claimed in claim 7 wherein the system is arranged to detect a pan of the image source by means of low-pass filtering GMO values from at least a sequence of previous images at a cut-off frequency lower than that expected from unintentional camera movements.
9. A stabilisation system as claimed in claim 7 wherein a zoom is detected if a number x of LMOs examined for image n all show a direction of movement in towards a central region of the image n, or all show a direction of movement away from a central region of the image n, the number x being greater than some given threshold.
10. A stabilisation system as claimed in claim 9 wherein the threshold is 50% of those LMOs examined, and the number x is proportional to the absolute difference between the number of those LMOs examined showing a direction of movement in towards a central region of the image n, and those LMOs examined showing a direction of movement away from a central region of the image n.
11. A stabilisation system as claimed in claim 9 wherein the LMOs examined are taken from those local regions that are substantially adjacent the edge of image n.
12. A stabilisation system as claimed in claim 1 wherein the system is arranged to generate a border on at least one edge of the image n, the border being adjustable in size such that it covers any blank space between the edge of image n and the corresponding edge of a display area on which the image n is displayed.
13. A stabilisation system as claimed in claim 12 wherein the system is arranged to adjust the border size on at least one edge of the image n such that it also covers an area on image n corresponding to blank space present on one or more previous images.
14. A stabilisation system as claimed in claim 12 wherein the border generated by the system comprises image data from one or more previous images.
15. A stabilisation system as claimed in claim 1 wherein the system is arranged to scale the image n, such that it covers any blank space between the edge of image n and the corresponding edge of a display area on which the image n is displayed.
16. A stabilisation system as claimed in claim 1 wherein anomalous pixels of the image n are used to set corresponding pixels of the mask such that they are excluded from the estimation of the GMO.
17. A stabilisation system as claimed in claim 16 wherein the pixels above a threshold in an image comprising the absolute difference between the image n and a previous image m, both images n and m having had corrective movements applied, are regarded as anomalous.
18. A stabilisation system as claimed in claim 1 wherein the system is arranged to multiply the calculated GMO, as adjusted in any other operation, by a decay constant factor lying between 0 and 1 before shifting the image n.
19. A method of stabilising a current image relative to at least one previous image, where both current and previous images are part of a sequence of video images represented by an electronic signal, comprising the steps of: i. estimating a global motion offset (GMO) between the current and previous image representative of the spatial separation between the scene imaged in the current image and that imaged in the previous image; and ii. applying a corrective movement to the current image based upon the GMO; characterised in that: a mask image is used in estimating the GMO, the mask image representing a region or regions of the current image not to be considered in the GMO estimation, the region(s) being region(s) estimated as likely to mislead the estimation of the GMO.
20. A method as claimed in claim 19 wherein the method further includes the step of examining one or more local regions of the current image and corresponding local regions of a previous image, and estimating a local motion offset (LMO) representing the spatial separation between like features in corresponding local regions of the current and previous images, and if the, or each, LMO is greater than a given threshold, setting area(s) of the mask that correspond to this local region or regions to indicate omission from the GMO estimation.
21. A computer program designed to run on a computer and arranged to implement a video image stabilisation system, the system being arranged to receive as an input a digital signal representative of a plurality of images from an image source wherein, for an image n following at least an image (n−1) and an image (n−2), the system is arranged to estimate a Global Motion Offset (GMO) value between image n and a previous image representative of the spatial separation between the scene imaged in image n and the previous image, and apply a corrective movement to the image n based upon this GMO, characterised in that: the system is arranged to estimate the GMO for the image n with reference to a mask that represents a region or regions of the image n not to be considered in the GMO estimation, the region(s) being region(s) estimated as likely to mislead the estimation of the GMO.